
    CoreTSAR: Task Scheduling for Accelerator-aware Runtimes

    Heterogeneous supercomputers that incorporate computational accelerators such as GPUs are increasingly popular due to their high peak performance, energy efficiency, and comparatively low cost. Unfortunately, the programming models and frameworks designed to extract performance from all computational units still lack the flexibility of their CPU-only counterparts. Accelerated OpenMP improves this situation by supporting natural migration of OpenMP code from CPUs to a GPU. However, these implementations currently lose one of OpenMP’s best features, its flexibility: typical OpenMP applications can run on any number of CPUs, but GPU implementations do not transparently employ multiple GPUs on a node or a mix of GPUs and CPUs. To address these shortcomings, we present CoreTSAR, our runtime library for dynamically scheduling tasks across heterogeneous resources, and propose straightforward extensions that incorporate this functionality into Accelerated OpenMP. We show that our approach can provide nearly linear speedup to four GPUs over using only CPUs or one GPU, while increasing the overall flexibility of Accelerated OpenMP.
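    The core idea of scheduling a parallel loop across a mix of CPUs and GPUs can be sketched as splitting iterations in proportion to each device's measured throughput. The following is a minimal illustration of that general technique, not CoreTSAR's actual scheduler; the device names and throughput numbers are hypothetical.

```python
# Illustrative throughput-proportional work partitioning across
# heterogeneous devices (a sketch, not CoreTSAR's implementation).

def partition(total_iters, throughputs):
    """Split loop iterations proportionally to each device's measured
    throughput (iterations/second), giving rounding leftovers to the
    fastest device."""
    total_tp = sum(throughputs.values())
    shares = {dev: int(total_iters * tp / total_tp)
              for dev, tp in throughputs.items()}
    # Hand leftover iterations (lost to rounding down) to the fastest device.
    leftover = total_iters - sum(shares.values())
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += leftover
    return shares

# Hypothetical example: one CPU and two GPUs with measured throughputs.
print(partition(1000, {"cpu": 100.0, "gpu0": 450.0, "gpu1": 450.0}))
```

    A dynamic scheduler would re-measure throughput each iteration block and re-partition, which is how such runtimes adapt to load imbalance.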

    Power efficient job scheduling by predicting the impact of processor manufacturing variability

    Modern CPUs suffer from performance and power consumption variability due to the manufacturing process. As a result, systems that do not account for such manufacturing variability suffer performance degradation and wasted power. To avoid this negative impact, users and system administrators must actively counteract any manufacturing variability. In this work we show that parallel systems benefit from taking the consequences of manufacturing variability into account when making scheduling decisions at the job scheduler level. We also show that it is possible to predict the impact of this variability on specific applications by using variability-aware power prediction models. Based on these power models, we propose two job scheduling policies that consider the effects of manufacturing variability for each application and that ensure that power consumption stays under a system-wide power budget. We evaluate our policies under different power budgets and traffic scenarios, consisting of both single- and multi-node parallel applications, utilizing up to 4096 cores in total. We demonstrate that they decrease job turnaround time by up to 31% compared to contemporary scheduling policies used on production clusters, while saving up to 5.5% energy.
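    The budget-enforcement part of such a policy can be sketched as greedy admission: start queued jobs in priority order only while the sum of their predicted power draws stays under the system-wide budget. This is an illustrative sketch under assumed interfaces; the paper's actual policies and variability-aware power models are more sophisticated.

```python
# Sketch of power-budget-aware job admission (hypothetical interface).

def admit_jobs(queue, predicted_power, budget, running_power=0.0):
    """Greedily start queued jobs in priority order while the sum of
    their predicted power draws stays within the system-wide budget.

    queue           -- job ids in priority order
    predicted_power -- job id -> predicted power (W); in a
                       variability-aware system this prediction differs
                       per node even for identical nominal hardware
    """
    started = []
    used = running_power
    for job in queue:
        if used + predicted_power[job] <= budget:
            started.append(job)
            used += predicted_power[job]
    return started, used

# Hypothetical example: j3 must wait because j1 and j2 fill the budget.
started, used = admit_jobs(["j1", "j2", "j3"],
                           {"j1": 300.0, "j2": 450.0, "j3": 200.0},
                           budget=800.0)
```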

    Parallelizing Heavyweight Debugging Tools with MPIecho *

    Idioms created for debugging execution on single processors and multicore systems have been successfully scaled to thousands of processors, but there is little hope that this class of techniques can continue to be scaled out to tens of millions of cores. In order to allow development of more scalable debugging idioms we introduce MPIecho, a novel runtime platform that enables cloning of MPI ranks. Given identical execution on each clone, we then show how heavyweight debugging approaches can be parallelized, reducing their overhead to a fraction of the serialized case. We also show how this platform can be useful in isolating the source of hardware-based nondeterministic behavior and provide a case study based on a recent processor bug at LLNL. While total overhead will depend on the individual tool, we show that the platform itself contributes little: 512x tool parallelization incurs at worst 2x overhead across the NAS Parallel Benchmarks, and hardware fault isolation contributes at worst an additional 44% overhead. Finally, we show how MPIecho can lead to near-linear reduction in overhead when combined with Maid, a heavyweight memory tracking tool provided with Intel's Pin platform. We demonstrate overhead reduction from 1,466% to 53% and from 740% to 14% for cg.D.64 and lu.D.64, respectively, using only an additional 64 cores.
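    The overhead arithmetic behind rank cloning can be illustrated with a simple model: if each of N clones runs 1/N of the tool's instrumentation, the serial tool overhead divides by N, plus a fixed platform cost for running the clones. This is a back-of-the-envelope sketch with made-up numbers, not the paper's measured model.

```python
# Back-of-the-envelope model of tool-overhead parallelization via rank
# cloning (illustrative; real overheads depend on the tool and platform).

def cloned_overhead(tool_overhead, clones, platform_overhead=0.0):
    """If each of `clones` copies of a rank executes 1/clones of the
    tool's instrumentation, the serial overhead divides by the clone
    count, plus a fixed cost for running the cloning platform itself.
    Overheads are fractions of base runtime (e.g. 12.8 means 1,280%)."""
    return tool_overhead / clones + platform_overhead

# Hypothetical: a tool with 1,280% serial overhead, 64 clones, and an
# assumed 10% platform cost leaves roughly 30% overhead.
print(round(cloned_overhead(12.80, 64, 0.10), 2))
```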

    Movements of marine fish and decapod crustaceans: Process, theory and application

    Many marine species have a multi-phase ontogeny, with each phase usually associated with a spatially and temporally discrete set of movements. For many fish and decapod crustaceans that live inshore, a tri-phasic life cycle is widespread, involving: (1) the movement of planktonic eggs and larvae to nursery areas; (2) a range of routine shelter and foraging movements that maintain a home range; and (3) spawning migrations away from the home range to close the life cycle. Additional complexity is found in migrations that are not for the purpose of spawning and movements that result in a relocation of the home range of an individual that cannot be defined as an ontogenetic shift. Tracking and tagging studies confirm that life cycle movements occur across a wide range of spatial and temporal scales. This dynamic multi-scale complexity presents a significant problem in selecting appropriate scales for studying highly mobile marine animals. We address this problem by first comprehensively reviewing the movement patterns of fish and decapod crustaceans that use inshore areas and present a synthesis of life cycle strategies, together with five categories of movement. We then examine the scale-related limitations of traditional approaches to studies of animal-environment relationships. We demonstrate that studies of marine animals have rarely been undertaken at scales appropriate to the way animals use their environment and argue that future studies must incorporate animal movement into the design of sampling strategies. A major limitation of many studies is that they have focused on: (1) a single scale for animals that respond to their environment at multiple scales or (2) a single habitat type for animals that use multiple habitat types. We develop a hierarchical conceptual framework that deals with the problem of scale and environmental heterogeneity and we offer a new definition of 'habitat' from an organism-based perspective. 
To demonstrate that the conceptual framework can be applied, we explore the range of tools that are currently available both for measuring animal movement patterns and for mapping and quantifying marine environments at multiple scales. The application of a hierarchical approach, together with the coordinated integration of spatial technologies, offers an unprecedented opportunity for researchers to tackle a range of animal-environment questions for highly mobile marine animals. Without scale-explicit information on animal movements, many marine conservation and resource management strategies are less likely to achieve their primary objectives.

    Power-Bounded HPC Performance Optimization (Dagstuhl Perspectives Workshop 15342)

    This report documents the program and the outcomes of Dagstuhl Perspectives Workshop 15342 "Power-Bounded HPC Performance Optimization". The workshop consisted of two parts. In part one, our international panel of experts in facilities, schedulers, runtime systems, operating systems, processor architectures and applications provided thought-provoking and detailed insights into open problems in each of their fields with respect to the workshop topic. These problems must be resolved in order to achieve a useful power-constrained exascale system, which operates at the highest performance within a given power bound. In part two, the participants split into three groups to address specific subtopics identified during the expert plenaries. These subtopics were discussed in more detail, followed by plenary sessions to compare and synthesize the findings into an overall picture. As a result, the workshop identified three major problems which need to be solved on the way to power-bounded HPC performance optimization.

    Theory and practice of dynamic voltage /frequency scaling in the high performance computing environment

    This dissertation provides a comprehensive overview of the theory and practice of Dynamic Voltage/Frequency Scaling (DVFS) in the High Performance Computing (HPC) environment. We summarize the overall problem as follows: how can the same level of computational performance be achieved using less electrical power? Equivalently, how can computational performance be increased using the same amount of electrical power? In this dissertation we present performance and architecture models of DVFS as well as the Adagio runtime system. The performance model recasts the question as an optimization problem that we solve using linear programming, thus establishing a bound on potential energy savings. The architectural model provides a low-level explanation of how memory bus and CPU clock frequencies interact to determine execution time. Using insights provided by these models, we have designed and implemented the Adagio runtime system. This system realizes near-optimal energy savings on real-world scientific applications without the use of training runs or source code modification, and under the constraint that only negligible delay will be tolerated by the user. This work has opened up several new avenues of research, and we conclude by enumerating them.
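    The architectural intuition can be sketched with a simple two-component timing model: only the CPU-bound portion of a task scales with clock frequency, while memory-bound time does not, so a sufficiently memory-bound task can run at a lower frequency within a user-tolerated delay. This is an illustrative model with hypothetical numbers, not Adagio's actual algorithm.

```python
# Sketch of a frequency-selection heuristic under a CPU/memory split
# timing model (illustrative only; not the Adagio runtime).

def predicted_time(t_cpu, t_mem, f, f_max):
    """Execution time if CPU-bound work scales inversely with clock
    frequency and memory-bound work is frequency-independent."""
    return t_cpu * (f_max / f) + t_mem

def pick_frequency(t_cpu, t_mem, freqs, f_max, max_slowdown=1.05):
    """Lowest available frequency whose predicted time stays within
    max_slowdown of the time at the maximum frequency."""
    base = predicted_time(t_cpu, t_mem, f_max, f_max)
    for f in sorted(freqs):  # try the slowest (most energy-saving) first
        if predicted_time(t_cpu, t_mem, f, f_max) <= max_slowdown * base:
            return f
    return f_max

# A memory-bound task (1 s CPU-bound, 9 s memory-bound) tolerating 5%
# delay can drop well below the 2.4 GHz maximum:
print(pick_frequency(1.0, 9.0, [1.2e9, 1.8e9, 2.4e9], f_max=2.4e9))
```

    The same model makes clear why a CPU-bound task gains nothing: any frequency reduction translates almost directly into delay.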

    Applying High-Performance Computing to Multi-Area Stochastic Unit Commitment for Renewable Energy Integration

    We present a parallel implementation of Lagrangian relaxation for solving stochastic unit commitment subject to uncertainty in renewable power supply and in generator and transmission line failures. We describe a scenario selection algorithm inspired by importance sampling in order to formulate the stochastic unit commitment problem, and validate its performance by comparing it to a stochastic formulation with a very large number of scenarios, which we are able to solve through parallelization. We examine the impact of narrowing the duality gap on the performance of stochastic unit commitment and compare it to the impact of increasing the number of scenarios in the model. We report results on the running time of the model and discuss the applicability of the method in an operational setting.
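    The spirit of importance-sampling-based scenario selection can be sketched as follows: rather than sampling scenarios by probability alone, rank them by probability times cost impact, so rare but expensive contingencies are retained, then renormalize the kept probabilities. The helper below is hypothetical and much simpler than the paper's algorithm.

```python
# Illustrative scenario selection in the spirit of importance sampling
# (hypothetical helper; the paper's algorithm is more involved).

def select_scenarios(scenarios, k):
    """scenarios: list of (name, probability, cost_impact) tuples.
    Keep the k scenarios with the largest probability * cost_impact
    (the importance weight) and renormalize their probabilities so the
    reduced scenario set still sums to 1."""
    ranked = sorted(scenarios, key=lambda s: s[1] * s[2], reverse=True)
    kept = ranked[:k]
    total_p = sum(p for _, p, _ in kept)
    return [(name, p / total_p) for name, p, _ in kept]

# Hypothetical contingencies: the rare-but-expensive line outage is
# kept ahead of the rarer, less costly generator outage.
print(select_scenarios(
    [("no-failure", 0.90, 10.0),    # likely, cheap
     ("line-out",   0.08, 500.0),   # rare, expensive
     ("gen-out",    0.02, 300.0)],  # rarer, less expensive
    k=2))
```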